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Thanks to instrumental advances, new, very large kinematic datasets for 
nearby dwarf spheroidal (dSph) galaxies are on the horizon. A key aim of these 
5_i ■ datasets is to help determine the distribution of dark matter in these galaxies. 

Past analyses have generally relied on specific dynamical models or highly re- 
strictive dynamical assumptions. We describe a new, non-parametric analysis of 
the kinematics of nearby dSph galaxies designed to take full advantage of the 
future large datasets. The method takes as input the projected positions and 
radial velocities of stars known to be members of the galaxies, but does not use 
any parametric dynamical model, nor the assumption that the mass distribution 
follows that of the visible matter. The problem of estimating the radial mass dis- 
tribution, M{r) (the mass interior to true radius r), is converted into a problem 
of estimating a regression function non-parametrically. From the Jeans Equation 



-2- 



we show that the unknown regression function is subject to fundamental shape 
restrictions which we exploit in our analysis using statistical techniques borrowed 
from isotonic estimation and spline smoothing. Simulations indicate that M{r) 
can be estimated to within a factor of two or better with samples as small as 1000 
stars over almost the entire radial range sampled by the kinematic data. The 
technique is applied to a sample of 181 stars in the Fornax dSph galaxy. We show 
that the galaxy contains a significant, extended dark halo some ten times more 
massive than its baryonic component. Though applied here to dSph kinematics, 
this approach can be used in the analysis of any kinematically hot stellar system 
in which the radial velocity field is discretely sampled. 

Subject headings: Dark Matter; Galaxies: Dwarf; Galaxies: Kinematics and 
Dynamics; Methods: Data Analysis 

1. Introduction 

Despite their humble appearances, the dwarf spheroidal (dSph) satellites of the Milky 
Way provide a source of persistent intrigue. Mysteries concerning their origin, evolution, 
mass density, and dynamical state make it difficult to know where to place these common 
galaxies in the context of standard (e.g. Cold Dark Matter) models of structure formation. 
Are they primordial building blocks of bigger galaxies, or debris from galaxy interactions? 

While dSph galaxies have stellar populations similar in number to those of globular 
clusters [Mhum ~ 10 6_7 M Q ), their stars are spread over a much larger volume (R ~ 2-6 kpc 
compared to 10-50 pc in globular clusters) resulting in the lowest luminosity (i.e., baryonic) 
densities known in any type of galaxy. In many cases it is unclear how these galaxies could 
have avoided tidal disruption by the Milky Way over their lifetimes without the addition of 
considerable unseen mass. This characteristic of dSph galaxies suggests that the dynamics 
of these systems are dominated either by significant amounts of unseen matter, or that these 
galaxies are all far from dynamical equilibrium. 

In general, the Jeans Equations (Equations (4-21), (4-24), and (4-27) of Binney & 
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Tremaine 1987 (1), hereafter, BT87) provide a robust description of the mass distribution, 

M(r), of a collisionless gravitational system - such as a dSph galaxy - in viral equilibrium, 

Equation (5) below. Their general form permits any number of mass components (stellar, gas, 

dark), as well as anisotropy in the velocity dispersion tensor and a non-spherical gravitational 

potential. When applied to spherical stellar systems and assuming at most only radial or 

tangential velocity anisotropy, these equations can be simplified to estimate the radial mass 

distribution (Equation 4-55 of BT87): 

, ,/ x rv r 2 /dlnz/ din v r 2 A . , 

= G~ ( dliir + "dTnr" + ) ' « 

where v is the spatial density distribution of stars, v r 2 is the mean squared stellar radial 

velocity at radius r. The dimensionless isotropy parameter, f3(r), compares the system's 

radial and tangential velocity components: 

= 1-=- (2) 

Apart from the constraints on the geometry and the functional form of the anisotropy, 
Equation (1) is model- independent, making it an appealing tool. It is relevant that Equation 
(1) and (6) below are applicable to any tracer population that in equilibrium and satisfies 
the collisionless Boltzman Equation. 

Kinematic datasets for individual dSph galaxies have historically been far too small 
(typically containing radial velocities for ~ 30 stars; see Mateo 1998) to allow for a precise 
determination of M(r) using relations similar to Equation (1). Instead, authors have been 
forced to adopt additional strong assumptions that reduce the Jeans Equation to even simpler 
forms and where the relevant distributions {v(r) and v 2 (r) in Equation 1) are represented by 
parametric models. Specifically, if one assumes isotropy of the velocity dispersion tensor (i.e., 
(3 = 0), spherical symmetry, and that the starlight traces the mass distribution (effectively 
a single- component King model (Irwin and Hatzidimitriou 1995)), then one obtains for the 
M/L ratio (Richstone and Tremaine 1986): 

- = n 9a ° 2 (3) 

where <To is the one-dimensional central velocity dispersion, Iq is the central surface bright- 
ness, and Rh is the half-light radius. The parameter r\ is nearly equal to unity for a wide 
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range of realistic spherical dynamical models so long as the mass distribution is assumed to 
match that of the visible matter. With this approach - the modern variant of the classical 
'King fitting' procedure (King 1966) - the measured central radial velocity dispersion and 
surface brightness yield estimates of such quantities as the global and central M/L ratios. In 
all eight of the MW's measured dSphs 1 , large central velocity dispersions have conspired with 
their low surface brightnesses to produce large inferred M/L values. This line of reasoning 
has led to a general belief that dSph galaxies are almost completely dark-matter dominated, 
and their halos have assumed the role of the smallest non-baryonic mass concentrations 
identified so far in the present-day Universe. 

This analysis fails for galaxies that are far from dynamical equilibrium, for example due 
to the effects of external tidal forces from the Milky Way (Fleck and Kuhn 2003; Klessen 
and Kroupa, 1998). Numerical models aimed to investigate this (Oh et al. 1995; Piatek and 
Pryor 1995) generally found that tides have negligible effects on the central dynamics of dSph 
galaxies until the survival time of the galaxy as a bound system becomes comparable to the 
orbital time (about 1 Gyr for the closer dSph satellites of the Milky Way). Observations agree 
with this broad conclusion by finding that remote dSph galaxies are no less likely to contain 
significant dark matter halos than systems located closer to their parent galaxy (Mateo et al. 
1998; Vogt et al. 1995). However, so-called resonance models (Fleck and Kuhn 2003; Kuhn 
1993; Kuhn et al. 1996) have been proposed that imply the central velocity dispersions can 
be significantly altered due to the inclusion of stars streaming outward from the barycenter 
of a galaxy and projected near the galaxy core. Recent versions of these models invariably 
imply a significant extension of the affected galaxies along the line-of-sight (more precisely, 
along the line between the center of the dwarf and the Milky Way; Kroupa 1997; Klessen 
and Kroupa 1998) and a massive tidal stream along the satellite's orbit. Observations do not 
reveal strong evidence of significant line-of-sight distortions in dSph galaxies (Hurley-Keller 
et al 1999; Klessen et al. 2003), other than Sagittarius (e.g. Ibata et al. 1995); thus, for the 
purposes of this paper, we will assume that dSph galaxies are generally close to a state of 



: We exclude the Sagittarius dSph, which is unambiguously undergoing tidal destruction (Majewski et al. 
2003). 
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dynamical equilibrium. 

Even with this enabling assumption, the classical analysis of dSph masses as we describe 
it above is far from ideal for a number of reasons. First, though recent work (e.g. Irwin and 
Hatzidimitriou 1995) has helped to greatly improve estimates of dSph structural parameters 
(Jo and Rh in Equation 3), the errors in the velocity dispersions - often dominated by Poisson 
uncertainties due to the small number of kinematic tracers - contribute the principle source of 
uncertainty in M/L estimates. Second, there is little reason to suppose the assumption that 
mass follows light in dSphs is valid. In all other galaxies the bulk of the matter resides in a 
dark halo extending far beyond the luminous matter, a trend that becomes more exaggerated 
toward smaller scales (Kormendy and Freeman 2004). Finally, velocity anisotropies, if they 
exist, may mimic the presence of dark matter, and so represent a tricky degeneracy in the 
model even if the assumption of isotropy is dropped. 

Modern instrumentation is poised to deliver dramatically larger kinematic datasets to 
help minimize the first problem. For example, the Michigan/Magellan Fiber System, now 
operational at the Magellan 6.5-m telescopes, obtains spectra from which high-precision 
radial velocities can be measured of up to 256 objects simultaneously. As a result, it is now 
feasible to obtain thousands of individual stellar spectra in many dSph systems, enlarging 
sample sizes by more than an order of magnitude. This not only reduces the statistical 
uncertainty of the dispersion estimation, but can also provide information on the spatial 
variation of the dispersion across the face of a galaxy. These rich datasets therefore allow 
for fundamentally improved results even using fairly conventional analysis techniques. For 
example, by parameterizing the velocity anisotropy, Wilkinson et al. (2002) and Kleyna et 
al. (2002) show that samples of ~ 200 stars can begin to break the degeneracy between 
anisotropy and mass in spherical systems. 

But these large datasets also allow us to aim higher. In this paper, we introduce and 
develop the formalism for a qualitatively different sort of analysis designed to make the most 
efficient use of large kinematic datasets. Rather than adopting a model that parameterizes 
the various distributions used in the Jeans equation (e.g., M(r), u(r), v%, or /3(r)), we operate 
on the star count and radial velocity data directly to estimate the mass distribution non- 
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parametrically. We estimate the true three-dimensional mass distribution from the projected 
stellar distribution and the line-of-sight velocity distribution. In this first application of our 
technique, we still require the assumptions of viral equilibrium, spherical symmetry, and 
velocity isotropy (/3 — 0). In Section 2, we introduce notation and definitions of the three- 
dimensional as well as projected stellar and phase-space densities and review the Jeans 
Equation. In Sections 3 and 4, we show how to use very general shape constraints on the 
mass distribution to estimate the detailed form of the velocity dispersion profile and the 
radial mass distribution. We illustrate the process on simulated data, demonstrating that, 
when furnished with large datasets, nonparametric analysis is a powerful and robust tool 
for estimating mass distributions in spherical or near-spherical systems (Section 5). We 
illustrate the application of our approach to an existing - but relatively small - kinematic 
dataset for the Fornax dSph galaxy at the end of Section 5. 

Our emphasis in this paper is the application of our technique to dSph kinematics, but 
the method described in this paper is applicable to any dynamically hot system in which 
the radial velocity field is sampled discretely at the positions of a tracer population. Thus, 
our methodology would work for, say, samples of globular clusters or planetary nebulae 
surrounding large elliptical galaxies, or for individual stars within a globular cluster. Our 
approach follows earlier non-parametric analysis of globular cluster kinematics by Gebhardt 
and Fischer (1995) and Merritt et al. (1997), though the details of our method differ signif- 
icantly. 



Let X = (Xi, X 2 , X 3 ) and V = (VijVz, V3) denote the 3-dimensional position and 
velocity of a star within a galaxy. We will regard these as jointly distributed random vectors, 
as in BT87, p. 194. Suppose that X and V have a joint density that is spherically symmetric 
and isotropic, so that 



2. Jeans' Equation 




(4) 
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where r 2 = x\ + x\ + x\ and v 2 = v\ + t>| + v\. We also suppose that V has been centered 
to have a mean of zero so that J vf (r,v)dxdv = 0. Suppose finally that the mass density 
p(r) is spherically symmetric and let 

M(r) = 4vr / t 2 p{t)dt, (5) 
Jo 

the total mass within r of the center. The goal of this paper is to come up with a means 
of estimating M(r) non-parametrically, specifically without assuming any special functional 
form for f (r,v) or p(r). In the presence of spherical symmetry, the relevant form of the 
Jeans Equation is 

M(r) = -^^^log[/(r)A*(r)], (6) 

where 



K r ) = J v fo(r,v)dv (7) 

and 

f( r ) = J fo{r,v)dv. 

In statistical terminology, /i(r) is the conditional expectation of V 2 = (V 2 + V 2 2 + V 2 ) /3 given 
x (this is the same as the conditional expectation on V 2 , hence the factor of 3), and the 
marginal density of X is f(r). In astronomical terms, /i(r) is the one-dimensional radial- 
velocity dispersion profile squared, and /(r) is the true, three-dimensional density profile 
of the tracer population. Using the notation of BT87, these correspond to v 2 and ^(r), 
respectively. 

To estimate M(r) from equation (6) clearly requires estimates of the functions / and 
fi. For dSph galaxies, the data available for this consist of a large sample of positions and 
photometry of stars in individual systems (Irwin and Hatzidimitriou 1995; hereafter IH95), 
and much smaller samples of stars with positions and velocities (e.g. Walker et al. 2004). Of 
course, it is not possible yet to determine complete three-dimensional positional or velocity 
information for any of these stars, but only the velocity in the line of sight and the projection 
of position on the plane orthogonal to the line of sight. With a proper choice of coordinates, 
these observables become Xi,X 2 , and V3. These can be made equivalent to, say, right 
ascension and declination, and radial velocity, respectively. The incomplete observation of 
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V does not cause a problem here, given the assumed isotropy, since v 2 can be replaced by 3v 2 
in Equation (7) without changing the value of fi(r). The incomplete observation of position 
poses a more serious problem that is known as Wicksell's (1925) Problem in the statistical 
literature. The procedures for estimating / and /i are consequently different; they will be 
considered separately in Sections 3 and 4. 



3. The Spatial Distribution of Stars: Estimating / 

Let R = X 2 + X\ + X 2 and S = a/ X\ + X 2 denote the true three-dimensional dis- 
tance of a star and its two-dimensional projected distance from the origin, respectively. We 
denote the corresponding densities as /r and g§- Then fn{f) — 47rr 2 /(r), and 

. , A f°° f(r)rdr 
For an example, consider Plummer's distribution, 



l + \r 2 2 



(9) 



where Cq is a constant, and b is a parameter that is related to the velocity dispersion through 
E(V 2 \R = r) = 6/[2yT + r 2 /3]. Here r is measured in units of 100 pc and v in km/s. 
We also employ here the notation [x} + to denote taking the larger of x and 0. With this 
definition, we take [x]J_ to be shorthand for + that is, the value of x is either positive 



or zero before raising to the power a. For this case, fn(r) = r 2 /[y/3^ (1 + r 2 /3) 5 ] and 
9s{ s ) — 2s/[3(l + s 2 /3) 2 ]. We use the Plummer distribution in our simulations below. 

IH95 provide data from which / can be estimated. These data consist of a sample 
of stars with projected radii S\, - ■ ■ , Sn and counts Nk of the number of stars for which 
rfc_! < S < Tfc where = r < r\ < ■ ■ ■ < r m divide the stars into bins. Let G§ denote the 
distribution function of S, so that Gs(s) = P[S < s] = J Q S g s (t)dt. Then Gs(r k ) may be 
estimated from its empirical version 
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In this context and throughout this paper, the # symbol denotes functions estimated directly 
from empirical data. Since the samples of stars used by IH95 to determine the surface density 
of most of the Milky Way dSph galaxies are quite large, there is generally very little statistical 
uncertainty in our estimate of G$ from Equation (10). If we now model / by a step function, 
say /(r) = fk for r^-i < r < r\~ and k — 1, • • • , m, then /i, • • • , f m may be recovered directly. 
In this case, the integral in Equation (8) is easily computed leading to 

m 

1 - Gs(rj_i) = ^2 a jkfk, 

k=j 

where 




Then f\ , ■ • ■ , f m may be recovered from 

f j = —[l-G B (r j - 1 )- J2 a ikfk]. (11) 
n k=j+i 

The summation is to be interpreted to be zero when j = m, and the right side of equation 
(11) can in practice be estimated by replacing Gs with its empirical version Gg. 

The results of applying this method to a sample of 150,000 positions drawn from a 
Plummer model with b = 200 x 3.1 x 10 15 km 3 / sec 2 are shown in Figures 1 and 2. Figure 
1 shows simulated counts of the number of stars in equally-spaced bins ranging from = 
?"o < f\ < • ■ • < rso = 16 x lOOpc . The estimate of / from these simulated data is shown in 
Figure 2. In both figures the dotted line denotes the true function. 

The estimation of / follows Wicksell's (1925) original analysis and can almost certainly 
be improved. We have not done that here, because the sample sizes are so large, but related 
questions are currently under investigation by Pal (2004). 

4. Velocity Dispersion Profile: Estimating n and M 

For estimating \x and M it is convenient to work with squared radius Y = S 2 = X 2 + X 2 , 
as in Groeneboon and Jongbloed (1995) and Hall and Smith (1992). The joint density of 
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Fig. 1. — Simulated projected counts for 150,000 stars drawn from a Plummer model. The 
histogram shows the distribution of selected stars from this model; the dotted line shows the actual 
distribution defined by the model. 
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Fig. 2. — The function / derived from the counts shown in Figure 1 derived using Equation (11) 
and for a sample size of 150,000 positions derived from a Plummer model. The true distribution is 
shown as a dashed line. 
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(Y, V) is then 



fo(Vz,v) . . 

g Y ,v(y,v) = 7T dz. 12 



il 



Relation (7) represents a special case on integrating over all three velocity components and 
using gs(s) = 2sg Y (s 2 ). Now let 



» = J vlf {y/z,v)dv, 



and 



= J vlg Y> v(y,v)dv, 

so that ip{y) / ' gy{y) is the conditional expectation of V 3 2 given Y — y, and similarly for <ft. 
We also define the important function 

J y y/t-y 

Then, reversing the orders of integration and recognizing a beta integral (see BT87, p. 205), 
we can write 

POO 

V(y)=ir 2 <t>{z)dz (13) 

J V 



and 



7T 2 



where the prime notation (') denotes the derivative. Clearly, f(r)fi(r) = 0(r 2 ) in Equation 
(7). Thus, the expression for M(r) from the Jeans equation (Equation (6)) can be rewritten 

as 

M < r) = G7'JW- (15) 

Our estimator for / was obtained in Section 3, so we focus here on determining ty". 
For our problem, we benefit from noting that the function \I/ is subject to certain shape 
restrictions which are useful in the estimation process. From Equation (13) we can see that 
\l/ is a non-negative decreasing function, while from Equation (15) it is evident that \l/ has 
a non-negative second derivative (that is, it is upward convex). In fact, r 3 \l/"(r 2 )/ f(r) must 
be an increasing function that vanishes when r = 0, physically consistent with its direct 
proportionality to M(r) in Equation (15). 
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The next step in the estimation of M(r) is to estimate Suppose that there is a 
sample (X^i, X it2 , Ui), i = 1, • • • , n of projected positions (X^i, -Xj i2 ) and radial velocities Ui 
measured with known error, Thus, Ui = V^z + e«, where where e±, • • • , e n are independent 
of (X^i, Xj^, ^,3), i = 1, • ■ • , n with zero mean values and known variances, of, • • • , o^. 
In practice, there may be selection effects: An astronomer may choose to sample some 
regions of a galaxy more intensely than others, or external factors (weather, moonlight, 
telescope/instrument problems) may cause undersampling of some regions. To address this, 
we must include a selection function, wo(xi,X2), into the model. Here, wq(xi,X2) is the 
probability that a star in a galaxy is included in the kinematic sample, given that it is there. 
Thus the selection probability is assumed to depend on projected position, (xi, x 2 ), only. As 
before, let Yj = Xf tl + Xf 2 ] then the joint density of Yi and V i in the sample is 

9y,v(Vi v ) = w(y)g Y ,v(y, v), (16) 

where 

1 r 

w(y) = - wo{y/ycos(9),y/ysm(6)]d9, 

gY,v is as in Equation (12 ), and c is a normalizing constant, determined by the condition 
that the integral of Jyv^i v ) be one. Integrating over v in (16), the actual density of Y is 

&{y) = w(y)g Y (y), (17) 

and c is determined by the condition that J g Y {y)dy = 1. Here g Y may be determined from 
the complete photometric data for a given galaxy to give the projected stellar distribution 
as a function of Y. If / is a step function, then combining equations (8) and (11) with the 
relation gs(s) = 2sg Y {s 2 ) leads to 



Shr(y) = 2vr 



m / v 

k=j+i ^ ' . 

in the interval r|_ x < y < r| for j — 1, ■ ■ ■ , m. If an astronomer can specify wo(x\, x 2 ), then 
c and w(y) can be computed directly using Equation (15). Then 
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is an unbiased estimator of ^(z) for each z; that is, = ^(2) for each z. 

If an astronomer cannot specify wq, then it must be estimated from the data. The first 
step is to estimate g^, which can in principle be done in many ways. The simplest is to use 
a kernel estimator of the form 



1=1 N 



where K is a probability density function, called the kernel, and b is a positive factor called 
the bandwidth that may be specified by the user or computed from the data. Silverman 
(1986) is a recommended source for background information on kernel density estimation. 
Possible choices of K and the computation of b are discussed there. Other methods for 
estimating g Y include local polynomials [79-82, Loader 1999] and log-splines [178-179, Gu 
2002]. Whichever method is adopted, once g Y has been estimated, w may be subsequently 
estimated by 

9Y{y) 

Then \1/ can be estimated by replacing w with w in (18). In this case \&# is no longer exactly 
unbiased, but it is at least consistent in the statistical sense that as the number of data 
points increases, the estimator continuously tends ever closer to the true value. 

Generally, \I/*(z) does not satisfy the shape restrictions when viewed as a function 
of z: ^>#(z) is unbounded as z approaches any of the data points Y{. It is decidedly not 
monotone nor convex (see Figure 4). For these reasons may be called the naive estimator. 
This naive estimator can be improved by imposing physically justified shape restrictions on 
M(r). One can imagine a wide range of possible model forms for M(r), but in the spirit of 

attempting a non-parametric solution, we adopt a simple spline of the form 

in 

M(r) = ]T>[r - r^ft, (20) 

i=l 

where = r < r 1 < ■ ■ ■ < r m are knots, • • • , /3 m are constants, and x\ = [max(0, x)] p . 
The simplest case is for p — 1, but this form for M(r) leads to a divergent estimate of 
the central velocity dispersion, /i(0). For p = 3, M(r) ~ Z^r 3 as r — > 0; thus, 3/?i/47r 
provides an estimate of the central mass density with the correct asymptotic behavior at the 
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galaxy center. Unfortunately, translating the shape restrictions on M(r) into conditions on 
flx, • • • , /3 m is much more difficult when p = 3 than for p < 2. For this paper we have chosen 
to compromise with p = 2 because it leads to a good estimate of fi(0) yet is still amenable 
to the application of the shape restrictions we described above. We plan to tackle the case 
p = 3 in a future paper. 

For our adopted case (p = 2) the expression (Equation (20)) for the mass, M(r), is a 
quadratic spline: 

m 

M(r)=^A[r-r,_4. (21) 

i=l 

One natural constraint is that M(r) remain bounded as r — > oo. Expanding Equation (21) 
to read 

EftK 2 -2 E M-i r + E & r *-i> ( 22 ) 

i=l / \i=l / i=l 

for r > r m _i, we see that this constraint is equivalent to requiring that J^Ili A = 0' an d 
X]^iA r i-i = 0. From this first equality constraint (XX=iA = 0)> we can trivially say 
Pm = - YJi=\ A an d' therefore, 

m— 1 

M(r) = b{V ~ - [r - r ro _ift} . (23) 

i=l 

We also require that M(r) be a non- decreasing function of r (no negative mass), or equiv- 
alently that the derivative M'(r) be non-negative. It is clear from Equations (21) and (23) 
that M'(r) is a piecewise linear function that is constant on the interval r m _i < r < r m . 
Thus, the condition that M(r) be non- decreasing is equivalent to requiring that M'(rj_i) > 
for i — 1, ■ ■ • , m — 1. That is, 

5>(ri - tvi) > (24) 
i=i 

for j = 1, • • ■ , m—1. For the case j = m—1, this constraint can be written as Yl^^ A( r m-i — 
rj_i) = 0. This is clear from noting that we do not change this summation if we replace m — 1 
with m; thus we can write YlT=i A( r m-i ~ r i-i) = Y^iLi A( r m-i — fi-i), where we have used 
the facts that A) r m-i = and that X)iiiA r i-i = as required by Equation (21). 

These are the constraints imposed on j3\, • ■ ■ , j3 m -\ that we employ below to estimate M(r). 
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If we solve for \P" in Equation (15), we can write 

i=i 



where 



For notational convenience, we define y = r 2 , then solve for \]/(r 2 ) = ^(y) in terms of the 
coefficients fa to get 

/•oo m— 1 

*(y)=/ (t-y)* /, (t)rft = ^Ar 4 (i/), (25) 



8=1 



where 

ri(j/)= / (t-y) lt (t)dt 



for i = 1, • • • , m — 1. 



The next step is to impose the shape restrictions implicit in Equation (15). If 
were square integrable, this could be directly accomplished by minimizing the integral of 
— ^J 2 , or equivalently by minimizing 

k = - V(t) 2 dt- V#(t)V(t)dt (26) 
2 Jo Jo 

with respect to the coefficients, The function is not integrable, essentially because 
Jjj 1 c&r/a; = oo, but k can still be minimized. Letting (3 be the vector (3 = [fli, ■ ■ ■ ,/3 m -i] T , 
this criterion may be expressed in matrix notation as 

K= l -fQP-(3 T ^ (27) 

where the elements of the vector z are given by 

/"OO 

Zi= ^*(t)Ti(t)dt, 

Jo 

and the elements of Q are given by 

POO 

qij = / Ti^Tj^dt 
Jo 

for i, j = 1, ■ ■ • ,m. Thus, the estimation problem for \1/ leads to a quadratic programming 
problem of minimizing k in Equation (27) subject to condition (24) with equality when 
j = m — 1. 
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If we can determine z% and g^, we can estimate the coefficients $ which in turn gives 
us our estimate for M(r). Observe that the Zi depend explicitly on both the deduced stellar 
density distribution, /, and the velocity dispersion profile derived from a set of radial velocity 
tracers, ji. The g^- depend only on /(r). Let tyf be defined as 



(z-y)^*(y)dy 



- 



4 3 1 



(28) 



where the weighting function, w, may be specified or estimated as described earlier in this 
section. Then z$ can then be computed numerically as 



z, 



^f{y)ii{y)dy. 



(29) 



This form for the Zi is simpler to evaluate than the double integral form for these terms given 
earlier. In a similar fashion we can write qij as 



oo ^oo poo 
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These integrals (29) and (30) can be computed explicitly and Matlab codes are available at 
website http://www.stat.lsa.umich.edu/~wangxiao to do so. 

If we define (3 as the solution to the quadratic minimization problem (see Equation 
(27)), our estimates of ^f(y), &'{y), M 7 ")' an d M(r) can be denoted using analogous notation 
and are given by 
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and, 
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M(r) = J2M[r- r k ^f + - [r - r m ^f + } . 
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The latter two functions specify the projected radial velocity and mass profiles, respectively 
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Fig. 3. — A plot of the (S, U) pairs used in one Monte Carlo realization described in Section 5.1. 



5. The Mass Profile: Simulations and a Trial Application 



5.1. Monte-Carlo Simulations 



To illustrate an application of our approach, we have drawn a sample of 1500 (S, U) 
pairs from a Plummer model with b = 200 x 3.1 x 10 15 km 3 /sec 2 and of = 1 (see Figure 
3). The naive estimator may be computed from these data using Equation (18) and 
it is shown in Figure 4, along with the improved estimator ty. The need for the improved 
estimator for \l/ is clear since, as expected, is a highly irregular function which reacts 
strongly to the presence of individual data points. The estimated velocity dispersion profile, 
jl(r) is presented in Figure 5 and the estimated mass profile, M(r), is illustrated in Figure 
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r (x 1 2 pc) r(x10 2 pc) 

Fig. 4. — (Left) The 'naive' estimator as denned by Equation (18) and derived for the Monte- 
Carlo simulation described in Section 5.1 and for the (S,U) pairs shown in Figure 3. (Right) 
The improved estimator ^ derived from imposing the shape restrictions on the form of M(r) (see 
Equations (20) and (21)) for the same simulated data set. In both figures the solid line is the 
estimator and the dashed line is the true distribution calculated from the underlying Plummer 
model. 

6. In all cases, the dotted line represents the true functions computed directly from the 
Plummer model from which the (S, U) pairs were drawn. In these figures n = 1500, m = 30, 
and w = 1. 

The last plataux on the right in Figure 6 (r > 14) deserves comments, because it 
illustrates an important feature of the problem in general and our method in particular: 
Estimates become unreliable for large values of r, because the data become sparse, and 
shape restricted methods are especially prone to problems such problems-e.g. Woodroofe 
and Sun (1993). Deciding exactly where reliability ends is difficult, but the confidence bands 
of Figures 7 and 9 provide some indication. In the example of Figure 6, they correctly predict 
that reliability deteriorates abruptly for r > 14 and, interestingly, make the same prediction 
for a range of sample sizes. 
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r(x 10 2 pc) 

Fig. 5. — The radial velocity profile, n(r), derived from the Monte-Carlo realization based on the 
Plummer Model described in Section 5.1. The actual profile calculated from the Plummer Model 
is shown as a dashed line. Our estimator, /t(r) is shown as a solid line. 

5.2. Assessing the Estimation Error 

There are several sources of error in our derived velocity dispersion and mass profile esti- 
mators which may be divided into the broad classes of modeling error, systematic error, and 
statistical error. While quite general, our assumptions also represent some over-simplification 
and these too introduce uncertainty to our results. The assumption of spherical symmetry, 
for example, ignores the known ellipticities seen in most dSph galaxies (IH95, Mateo 1998). 
But even if we assume our working assumptions were all correct, systematic error will develop 
from our approximations of distributions as step functions, piecewise linear functions, and 
quadratic splines. On general grounds, this error will be small provided that the underlying 
true functions are smooth, so we feel justified in ignoring this source of systematic error. 
We have also ignored the statistical error in the estimation of / in Section 3, because the 
sample sizes in IH95 are so large. That leaves the statistical error in the kinematic data as 
the principle source of uncertainty in our estimation of \& and the estimators of \i and M. 

To quantify the magnitude of this error, let e(r) = M(r) — M(r) denote the error com- 
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Fig. 6. — The estimator of the mass distribution, M(r) for the Monte-Carlo simulation based on 
a Plummer model described in Section 5.1. The piecewise nature of the estimator (see Equation 
(20) is apparent. The dashed line is the true mass distribution for the underlying Plummer model. 

mitted when estimating M(r), and let z a (r) denote the 100a th percentile of its distribution 
distribution, so that P[e{r) < z a (r)\ = a. Then 

P[M(r) - zi- a (r) < M(r) < M(r) - z a (r)] = 1 - 2a, 

and [M(r) — Z\- a (r), M{r) — z a {r)\ is a 100(1 — 2a) percent confidence interval for M(r). 
Within the context of Plummer's Model, the z a (r) can be obtained from a Monte Carlo 
simulation. If we generate Nmc samples from Plummer's Model, we obtain Nmc values 
for the errors e(r), which we denote as ei(r), ■ ■ • , ejv(r). Then z a (r) may be estimated by 
the value of z for which the Na of the ej(r) are less than or equal to z. We applied this 
procedure to a grid of r values with Nmc — 2000 and a = 0.025 and then connected the 
resulting bounds to in a continuous piecewise linear fashion to obtain the 95% and 68% 
confidence bounds shown in Figure 7. The 95% and 68% confidence bounds for \x shown in 
Figure 8 were obtained in a similar manner. 

The procedure just outlined requires knowing the true M{r) and the distributions of Y 
and V3 from the Plummer model. But the exercise is still useful since it shows the intrinsic 
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Fig. 7. — A plot of the 95% (solid lines) and 68% (inner dot-dashed lines) confidence bounds on 
the mass distribution, M(r) (see Figure 6). The actual mass distribution from the Plummer model 
is shown as a dashed line. The strong positive bias at large r (especially for M{r)) is evident for 
r > 14 where the number of data points constraining the mass distribution is very low (about 1% 
of the entire sample lies outside r ~ 14). 

accuracy of the estimation procedure. For example, when the sample size is 1500, the 
estimator of M(r) has very little accuracy for r > 14. Somewhat surprisingly, the estimator 
for M(r) retains reasonable accuracy for r < 14 despite the lack of much data beyond r = 14 
(see Figure 1). We have also explored how our estimates fare with significantly smaller 
samples of tracers with known positions and velocities (i.e., fewer (S,U) pairs). Figure 9 
shows 95 percent confidence bounds for the estimates of M(r) and /x(r) based on sample 
sizes 100, 400, 700 and 1000 stars. As one might expect, the confidence bounds expand 
as the sample size decreases, but, interestingly, the radial location of where the estimator 
performs poorly only very slowly migrates to smaller values of r as the sample shrinks. 

The requirement that we know M(r) and the distribution of Y and V3 ahead of time in 
order to estimate the confidence bounds can be avoided by using a bootstrap procedure. In 
its simplest form, the bootstrap can be implemented as follows: Given a sample of (Y, V3) 
values (corresponding to the observable projected position and radial velocity for each star 
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Fig. 8. — The 95% (solid lines) and 68% (dot-dashed lines) confidence bounds on the velocity 
dispersion profile estimator, /t(r), along with the actual dispersion profile for the Plummer model 
(dashed line). 

in a kinematic sample), first compute M(r). Then generate iV& samples with replacement 2 
from the given sample and compute estimators M*(r), • • ■ , M^{r) for each of these samples. 
Let e*(r) = M*(r) — M(r) and estimate the percentiles of e*(r) as described above. Letting 
z*(r) denote the estimated 100a th percentile, \M{r) —z\_ a {r), M(r) — 5*(r)] is the bootstrap 
confidence interval for M(r). For more details, see Efrom and Tibshirani (1993) who provide 
a good introduction to bootstrap methods. This approach offers a practical way of assessing 
the statistical errors in (x{r) and M(r) of an actual kinematic sample. 

There is also the possibility of obtaining large-sample approximations to the error dis- 
tribution. From (29), it is clear that the Z{ are approximately normally distributed in large 
samples. This suggests that the distribution of f3 should be approximately the same as the 

2 Drawing from a sample 'with replacement' means that for an original sample of m objects, one selects a 
new sample of m values from the original values mj . The new sample differs from the original in that each 
value uii is returned to the sample before choosing the next value mj+i. In this way, some values from the 
original sample will invariably be chosen more than once to form the bootstrap sample. 



Fig. 9. — The 95% (solid lines) and 68% (dot-dashed lines) confidence bounds for different sample 
sizes of stars (from top, for 100, 400, 700 and 1000 stars) derived from a Plummer Model using 
Monte Carlo methods. The lefthand figures show the confidence bounds for the mass distribution 
estimator, M(r), and the righthand panels show the confidence bounds for the velocity dispersion 
profile estimator, fi(r). In each panel, the dashed line is the true distribution for the Plummer 
model. 
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distribution of a shape-restricted estimator in a normal model. Unfortunately, calculating 
the latter is complicated. Some qualitative features are notable, however. Monotone re- 
gression estimators are generally quite reliable away from the end points, but subject to 
the spiking problem near the end points. For non-decreasing regression, the estimator has a 
positive bias near large values of the independent variable (in this case, r), and a negative 
bias at small values of the independent variable (i.e. for r ~ 0). The positive bias at large r 
is very clearly seen in Figure 7 and is exacerbated by the sparsity of data. The negative bias 
near r = is not evident because it is ameliorated by two effects. First, the mass distribution 
is known to be always positive (M(r) > 0) and goes to zero at r = (M(r = 0) = 0). This 
eliminates bias at r = 0. Second, our assumption (23) requires M(r) to be of order r 2 for 
small r, while M(r) is - on physical grounds - known to approach zero as r 3 for small r. 



5.3. Application of the Method: The Fornax Dwarf Spheroidal Galaxy 

As an application to real observational data, we have use the radial velocity data from 
Walker et al (2005) for the Fornax dSph galaxy. This dataset consists of velocities for 181 
stars observed one at a time over the past 12 years (Mateo et al. 1991 has a description 
of some of the earliest data used here). Data from newer multi-object instruments are 
not included here (see Walker et al 2005b for a first analysis of these newer results). Full 
details about the observations can be found in the papers referenced above; for the present 
application we simply adopt these results along with their quoted errors to see what sort of 
estimator for M(r) we can extract from these data. From the simulations we already know 
that the dataset is relatively small for this method, but these simulations also imply that we 
have no reason to expect a strong bias in our mass estimator due to the sample size, only a 
potentially large uncertainty (at least a factor of 2) in the final mass estimate. 

The histogram of the star counts from (Irwin and Hatzidimitriou 1995) is shown in the 
first panel of Figure 10. The remaining two panels in Figure 10 show the derived estimates 
for the spatial density of stars in Fornax f(r) and the true projected radial distribution of 
stars in Fornax g^(y). The bottom panels of Figure 11 give the estimated projected radial 
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distribution of stars from (Walker et al, 2004) and the estimated selection function u(y) 
derived from the data. Figure 12 show the mass profile, M(r) and the squared velocity 
dispersion profile, p,(r). 

Previous estimates of the Fornax mass have been based primarily on the classical, or 
'King' (1966) analysis outlined in the introduction of this paper, and give Mp ornax range from 
6 to 10 x 10 7 M (Mateo et al. 1991; Walker et al. 2005a). These models resemble truncated 
isothermal spheres and implicitly assume spherical symmetry, velocity isotropy, that mass 
follows light, and a specific parametric form for the joint density function fo(r, v). Our mass 
distribution estimator provides an independent measurement employing neither of the latter 
two assumptions. If one believes the second plateau seen at r > 1.5 kpc in Figure 12 (first 
panel) is a real feature, this would indicate a Fornax mass of 3.5 x 10 8 M o . However, we 
note that the simulations using Plummer models (Figure 9) show similar plateaux at large 
radii related not to the underlying mass, but rather to the scarcity of remaining data points. 
The first plateau (0.8 < r < 1.3 kpc), by contrast, covers a region for which the Plummer 
simulations indicate even a relatively small dataset can yield a reasonable estimate of the 
mass. We therefore consider the two plateaux in the M(r) curve of Figure 12 to bracket 
the region of plausible Fornax mass as measured from anN ~ 200 sample by the estimation 
technique. The resulting interval of 1.0 x 1O 8 M <Mj7 WTO!I < 3.5 x 10 8 M is of considerably 
larger mass than the results of the classical analysis. 

The estimation approach then offers an attractive alternative to variants of the classical 
analysis that require parametric dynamical models to interpret kinematic data for dSphs. 
Our simulations show that this new non-parametric analysis will provide significantly more 
accurate results as kinematic samples grow larger. 

Finally, we can make some comments regarding the effect of velocity anisotropy in our 
results. If we consider an extreme case where the outermost bins of the Fornax velocity 
dispersion profile are dominated by stars in tangential orbits (that is, vq 2 » v r 2 ; see 
Equations (1) and (2)), then the outer bins give us the mass directly (from Equation 1) 
as M(r) ~ rvg 2 /G, or 10 8 M , which is equivalent to the lower limit provided by the mass 
estimation technique. Thus, even if we consider an extreme breakdown of the isotropy 
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assumption, the data alone support a larger mass than the simple King analysis. Since it is 
more likely that the velocity distribution has intermediate anisotropy (Kleyna et al. 2002), 
we conclude that the the mass of Fornax to the outermost measured data point is between 
1.0 and 3.5 x 10 s M Q For an observed luminosity of 1.5 x 10 7 L for Fornax, this gives a global 
M/L ratio ranging from 7 to 22. Thus, even the most luminous, most baryonic-dominated 
dSph satellite of the Milky Way is dominated by dark matter. 

6. Discussion 

We have presented a new method for estimating the distribution of mass in a spherical 
galaxy. Along the way, we have made some choices, generally preferring the simplest of sev- 
eral alternatives-for example, the use of quadratic splines in Section 4 and simple histograms 
in Section 3. In spite of this, the algorithm is a bit complicated. Is it really better than sim- 
pler methods-for example, using kernel smoothing to estimate the distribution of projected 
radii and line of sight velocities and then using inversion to estimate M(r)? Our method 
differs from the simpler approach through its essential use of shape restrictions. From a 
purely statistical viewpoint, the monotonicity of r 3 ^"(r 2 )/ f(r) in Equation (15) provides 
a lot of useful information [Robertson, et.al. (1988)]. Early in our investigations, we tried 
the kernel smoothing outlined above and found that the resulting estimators did not satisfy 
the shape restriction, leading to negative estimates of the mass density. Imposing the shape 
restrictions directly guarantees non-negative estimates, at least. It also reduces the impor- 
tance of tuning parameters, like the bandwidth in kernel estimation or the number of bins 
in our work. Imposing the shape restrictions also complicates the algorithm and leaves a 
quadratic programming problem at the end. We think that the game is worth the candle. 

There is some similarity between our method and that of Merritt and Saha (1993) in 
that both use basis functions, splines in our case, and both impose shape restrictions, the 
condition that / > in theirs. Our method makes more esssential use of shape restrictions 
and treats the inversion problem in greater detail. 
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Fig. 10. — Observed and derived distributions from projected counts data in (Irwin and Hatzidim- 
itriou, 1995). Top: Histogram of counts from IH95. Bottom: The inferred three-dimensional dis- 
tribution of stars in Fornax assuming spherical symmetry, f(r); The projected radial distribution 
of stars in Fornax, §Y(y). 
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Fig. 11. — Observed and derived distributions from the kinematic dataset for the Fornax dSph 
galaxy (Walker et al. 2004). Top: The distribution of radial velocity /projected radial positions for 
the stars in the sample; Bottom: The projected radial distribution of stars in Fornax, g^iy)', The 
estimated selection function lj; 




Fig. 12. — The mass distribution estimator for Fornax, M(r); The radial velocity dispersion profile 
estimator, /t(r) of stars 



